Semi-supervised Discovery of Informative Tweets During the Emerging Disasters
Authors
Abstract
The first step towards using microblogging services such as Twitter effectively for situational awareness during emerging disasters is the discovery of disaster-related postings. Given the wide range of possible disasters, using a pre-selected set of disaster-related keywords for this discovery is suboptimal. The alternative we focus on in this work is to train a classifier using a small set of labeled postings that become available as a disaster emerges. Our hypothesis is that utilizing large quantities of historical microblogs can improve the quality of classification, compared to training a classifier only on the labeled data. We propose to use unlabeled microblogs to group words into a limited number of clusters and to use the word clusters as features for classification. To evaluate the proposed semi-supervised approach, we used Twitter data from 6 different disasters. Our results indicate that when the number of labeled tweets is 100 or fewer, the proposed approach is superior to standard classification based on the bag-of-words feature representation. Our results also reveal that the choice of the unlabeled corpus, the choice of word clustering algorithm, and the choice of hyperparameters can have a significant impact on the classification accuracy.
CCS Concepts • Information systems → Data analytics;
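The idea of using word clusters as classification features can be illustrated with a minimal sketch. It is not the authors' implementation: the word-to-cluster map below is a hand-made, hypothetical stand-in for clusters that would be learned from a large unlabeled tweet corpus, and the names `WORD_TO_CLUSTER`, `tokenize`, `cluster_features`, and `bag_of_words` are assumptions introduced only for illustration.

```python
from collections import Counter

# Hypothetical word-to-cluster map. In the approach described in the abstract
# this would be learned from unlabeled tweets; here it is written by hand.
WORD_TO_CLUSTER = {
    "flood": 0, "flooding": 0, "earthquake": 0, "quake": 0,
    "help": 1, "rescue": 1, "evacuate": 1,
    "concert": 2, "movie": 2, "pizza": 2,
}
N_CLUSTERS = 3


def tokenize(text):
    """Lowercase, split on whitespace, and strip basic punctuation."""
    tokens = (tok.strip(".,!?#@\"'") for tok in text.lower().split())
    return [tok for tok in tokens if tok]


def cluster_features(text, word_to_cluster=WORD_TO_CLUSTER, n_clusters=N_CLUSTERS):
    """Return a fixed-length vector with one word count per cluster."""
    counts = Counter(
        word_to_cluster[tok] for tok in tokenize(text) if tok in word_to_cluster
    )
    return [counts.get(c, 0) for c in range(n_clusters)]


def bag_of_words(text):
    """Return the sparse bag-of-words representation as word counts."""
    return Counter(tokenize(text))


tweet_a = "Flooding downtown, please help!"
tweet_b = "Quake hit us, rescue needed"

# The two tweets share no words, so their bag-of-words vectors do not overlap,
# yet they map to the same cluster-count vector.
shared_words = set(bag_of_words(tweet_a)) & set(bag_of_words(tweet_b))
print(shared_words)
print(cluster_features(tweet_a), cluster_features(tweet_b))
```

Because the cluster vector has only as many dimensions as there are clusters, a classifier trained on a small number of labeled tweets has far fewer parameters to estimate than with a full vocabulary, which is one plausible reading of why the abstract reports gains when 100 or fewer labels are available.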
Similar resources
Weakly Supervised Classification of Tweets for Disaster Management
Social media has quickly established itself as an important means that people, NGOs and governments use to spread information during natural or man-made disasters, mass emergencies and crisis situations. Given this important role, real-time analysis of social media contents to locate, organize and use valuable information for disaster management is crucial. In this paper, we propose self-learni...
Mining User Intents in Twitter: A Semi-Supervised Approach to Inferring Intent Categories for Tweets
In this paper, we propose to study the problem of identifying and classifying tweets into intent categories. For example, a tweet “I wanna buy a new car” indicates the user’s intent for buying a car. Identifying such intent tweets will have great commercial value among others. In particular, it is important that we can distinguish different types of intent tweets. We propose to classify intent ...
Tweeting Behaviour during Train Disruptions within a City
In a smart city environment, citizens use social media for communicating and reporting events. Existing work has shown that social media tools, such as Twitter and Facebook, can be used as social sensors to monitor events in real-time as they happen (e.g. riots, natural disasters and sport events). In this paper, we study the reactions of citizens in social media towards train disruptions withi...
Large-Scale Inference of Network-Service Disruption upon Natural Disasters
Large-scale natural disasters cause external disturbances to networking infrastructure that lead to large-scale network-service disruption. To understand the impact of natural disasters to networks, it is important to localize and analyze network-service disruption after natural disasters occur. This work studies an inference of network-service disruption caused by the real natural disaster, Hu...
A Comparison between Semi-Supervised and Supervised Text Mining Techniques on Detecting Irony in Greek Political Tweets
The present work describes a classification schema for irony detection in Greek political tweets. Our hypothesis states that humorous political tweets could predict actual election results. The irony detection concept is based on subjective perceptions, so only relying on human-annotator driven labor might not be the best route. The proposed approach relies on limited labeled training data, thu...
Journal:
- CoRR
Volume: abs/1610.03750 (no issue number given)
Pages: -
Publication date: 2016